Monotonic Entropies


Dan Simovici


Abstract 


We introduce an axiomatization of entropy that generates, as special
cases, novel entropy types. These entropies generalize Shannon's
entropy and allow the introduction of entropy for partitions of sets of
objects located in metric spaces, and for partitions of sets of vertices in
undirected graphs. Corresponding metrics on the sets of partitions are
introduced. We also hint at applications of these metrics to the evaluation
of clustering quality.
1 Introduction


The notion of entropy, the cornerstone of information theory, was introduced 
by Claude Shannon in his 1948 double paper [16, 17], as a limit of lossless 
data compression in a noiseless data transmission channel. There exists 
an ample literature containing axiomatizations of the notion of entropy 
for probability distributions. Some of these axiomatizations involve the 
Shannon entropy [6, 8, 11, 14]. Others, such as [24, 23, 7, 9, 20, 22], focus 
on generalizations of entropy. 

This note presents an axiomatization of entropy that leverages algebraic 
properties of sets of partitions of finite sets in order to produce a simpler 
system of axioms for entropy, and to extend this notion to a diverse collection 
of data types. 
Partitions are fundamental for clustering algorithms, which aim to
detect groupings of objects that have similar properties or are geometrically
close to each other. There is a vast literature (see [18]) that focuses on
clustering algorithms and a great diversity of approaches to clustering. Also,
evaluating cluster quality is an important and challenging task when comparing
the appropriateness of clustering algorithms for various object configurations.


Partitions of finite sets constitute a natural framework for studying 
non-overlapping clusterings. The metrics on the partition space of a set
of objects generated by various types of entropies offer an instrument for
assessing the quality of clusterings. In many cases, the data subjected
to clustering is labeled, and one natural way of grouping objects is to use
these labels and place objects with the same label in the same cluster. On the
other hand, objects can be grouped by their attributes using a variety of
clustering techniques. The metric space of partitions offers a methodology
for comparing naturally defined partitions (generated by object labels) with
partitions produced by clustering algorithms and, thus, for assessing the
efficacy of these algorithms.


The notion of entropy is usually defined for probability distributions.
A finite probability distribution presents itself as an $n$-tuple of non-negative
numbers $p = (p_1, \ldots, p_n)$ that satisfies the condition $p_1 + \cdots + p_n = 1$.
Its Shannon entropy is then defined as $H(p) = \sum_{i=1}^{n} p_i \log \frac{1}{p_i}$ and plays a
fundamental role in the study of information transmission.
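As a small illustration of this definition, the following Python sketch evaluates the Shannon entropy of a finite distribution. The function name `shannon_entropy`, the default base 2, and the convention that zero-probability terms contribute nothing are assumptions made for this example, not choices fixed by the text.

```python
import math

def shannon_entropy(p, base=2.0):
    """Return sum_i p_i * log(1/p_i) for a finite probability distribution p.

    Assumption of this sketch: terms with p_i = 0 are skipped (they contribute 0).
    """
    if any(x < 0 for x in p) or not math.isclose(sum(p), 1.0):
        raise ValueError("p must be non-negative and sum to 1")
    return sum(x * math.log(1.0 / x, base) for x in p if x > 0)

uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]
print(shannon_entropy(uniform))   # uniform distribution on 4 outcomes
print(shannon_entropy(skewed))    # a less uniform distribution has lower entropy
```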


In this note we adopt a related but distinct approach to entropy, by 
defining several types of entropies for partitions of finite sets instead of 
probability distributions. The advantage of this approach is the possibility 
of using the properties of the partially ordered set of partitions of a finite set.
Thus, we are able to formulate analogues of entropy suitable for partitions 
of metric spaces, or partitions of the sets of vertices of finite graphs. 


A central concept in this paper is the notion of monotonic function.
If $(P, \le)$ and $(Q, \le)$ are two partially ordered sets, a function $f : P \to Q$ is
monotonic if $x \le y$ for $x, y \in P$ implies $f(x) \le f(y)$. Monotonic functions will
serve to define three distinct types of entropies.


In Section 2 we review some elementary definitions and properties of 
partitions. Section 3 focuses on using numerical monotonic functions defined
on sets of partitions to introduce a set of three axioms that characterize three
different types of partition entropies involving partitions of unstructured
sets, partitions of subsets of metric spaces, and graph partitions. Conditional
monotonic entropies are discussed in Section 4. These entropies induce
metrics on various types of partitions, which we examine in Section 5. The
final Section 6 presents our conclusions and some ideas for future work.


2 Partitions 


The set of subsets of a set $S$ is denoted by $\mathcal{P}(S)$.

Formally, a partition of a set $S$ is a collection of non-empty subsets
of $S$ referred to as blocks, $\pi = \{B_i \mid i \in I, B_i \subseteq S\}$, such that $i, j \in I$ and
$i \ne j$ imply $B_i \cap B_j = \emptyset$, and $\bigcup_{i \in I} B_i = S$. If $\pi$ consists of two blocks,
$\pi = \{B_1, B_2\}$, we refer to $\pi$ as a bipartition. The set of partitions of a set $S$
is denoted by $\mathrm{PART}(S)$. The notation $\mathrm{PART}_{fin}(S)$ is reserved for the set of
finite partitions of $S$. Of course, when $S$ is finite, $\mathrm{PART}_{fin}(S) = \mathrm{PART}(S)$.

If $\pi \in \mathrm{PART}(S)$ and $x, y \in S$ belong to the same block of $\pi$ we
write $x \equiv y\,(\pi)$. The relation "$\equiv$" is reflexive, symmetric, and transitive
and, therefore, it is an equivalence relation on $S$. Conversely, if $\rho$ is an
equivalence on $S$, the sets of the form $[x]_\rho = \{u \in S \mid (x, u) \in \rho\}$ constitute
a partition $\pi_\rho$ of $S$.

A partial order "$\le$" is defined on partitions in $\mathrm{PART}(S)$ by setting
$\pi \le \sigma$ if each block of $\pi$ is included in a block of $\sigma$. The partition
$\alpha_S = \{\{x\} \mid x \in S\}$ is the least element of the partially ordered set
$(\mathrm{PART}(S), \le)$, while the single-block partition $\omega_S = \{S\}$ is the largest
element of $(\mathrm{PART}(S), \le)$.

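If $\pi, \sigma \in \mathrm{PART}(S)$, $\pi \le \sigma$, and there is no partition
$\tau \in \mathrm{PART}(S) - \{\pi, \sigma\}$ such that $\pi \le \tau \le \sigma$, then we say that $\sigma$ covers $\pi$
and we write $\pi \lhd \sigma$. It is easy to show that $\pi \lhd \sigma$ if and only if $\sigma$ is
obtained from $\pi$ by fusing two of the blocks of $\pi$ (see [19]).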
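The order on partitions and the fusing of two blocks can be sketched in Python as follows. Representing a partition as a list of `frozenset` blocks, and the helper names `refines` and `fuse`, are assumptions of this illustration.

```python
def refines(pi, sigma):
    """True if pi <= sigma, i.e. every block of pi is included in a block of sigma."""
    return all(any(b <= c for c in sigma) for b in pi)

def fuse(pi, i, j):
    """Partition obtained from pi by fusing its blocks with indices i and j (i != j)."""
    merged = pi[i] | pi[j]
    rest = [b for k, b in enumerate(pi) if k not in (i, j)]
    return rest + [merged]

S = frozenset({1, 2, 3, 4})
alpha_S = [frozenset({x}) for x in S]   # least element: all singletons
omega_S = [S]                           # largest element: one block
pi = [frozenset({1, 2}), frozenset({3}), frozenset({4})]
sigma = fuse(pi, 1, 2)                  # {{1,2},{3,4}}, obtained by fusing {3} and {4}

print(refines(alpha_S, pi), refines(pi, sigma), refines(sigma, omega_S))
print(refines(sigma, pi))
```

Here `sigma` is obtained from `pi` by fusing two blocks, so it covers `pi` in the sense described above.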

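Let $U, V$ be two non-empty, disjoint sets, and let $\sigma \in \mathrm{PART}(U)$ and
$\tau \in \mathrm{PART}(V)$, where $\sigma = \{B_1, \ldots, B_m\}$ and $\tau = \{C_1, \ldots, C_n\}$. The sum of
the partitions $\sigma$ and $\tau$ is the partition $\sigma + \tau \in \mathrm{PART}(U \cup V)$ defined as:

$$\sigma + \tau = \{B_1, \ldots, B_m, C_1, \ldots, C_n\}.$$

For every two non-empty disjoint sets $U$ and $V$ we have:

$$\alpha_U + \alpha_V = \alpha_{U \cup V},$$
$$\omega_U + \omega_V = \{U, V\} \in \mathrm{PART}(U \cup V).$$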


Furthermore, if $U, V, W$ are non-empty pairwise disjoint sets, $\sigma \in \mathrm{PART}(U)$,
$\tau \in \mathrm{PART}(V)$ and $\upsilon \in \mathrm{PART}(W)$, we have

$$\sigma + (\tau + \upsilon) = (\sigma + \tau) + \upsilon,$$

a property referred to as the restricted associativity of partition addition.
The term "restricted" refers to the fact that the underlying sets $U, V, W$ are
supposed to be disjoint.

If $\sigma = \{B_1, \ldots, B_m\} \in \mathrm{PART}(S)$ then we have:

$$\sigma = \omega_{B_1} + \cdots + \omega_{B_m}.$$

If the set $S$ consists of a single element, $S = \{s\}$, then $\alpha_S = \omega_S = \{\{s\}\}$.
The algebraic structure of sets of partitions as semi-modular lattices is 
discussed in the classical reference [4]. 


3 Axiomatization of Monotonic Partition Entropy 


Our axiomatization of partition entropies starts with monotonic functions 
defined on sets of partitions. We present three examples of monotonic 
functions defined on specialized collections of sets that will allow us to 
generate a variety of entropy types. 

Let $\mu : \mathcal{P}(S) \to \mathbb{R}_{\ge 0}$ be a non-negative monotonic function of sets,
that is, a function such that $U \subseteq V$ implies $\mu(U) \le \mu(V)$ for $U, V \in \mathcal{P}(S)$,
and $|U| > 1$ implies $\mu(U) > 0$.

Next, we consider examples of non-negative monotonic functions that 
generate corresponding entropies. 


Example 1. Let $S$ be a finite set and let $\mu : \mathcal{P}(S) \to \mathbb{R}_{\ge 0}$ be given by
$\mu(B) = |B|^\beta$ for $B \in \mathcal{P}(S)$ and some $\beta > 0$. The function is clearly
monotonic and $B \ne \emptyset$ implies $\mu(B) > 0$.

Furthermore, if $|B| = 1$, then $\mu(B) = 1$.


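Example 2. Let $W = \{x_1, \ldots, x_m\} \subseteq \mathbb{R}^n$ be a finite set and let $d$ be a
metric on $\mathbb{R}^n$. Define the centroid of $W$ as $c_W = \frac{1}{m} \sum_{i=1}^{m} x_i$.
The sum of square errors of the set $W$ is defined as:

$$\mathrm{sse}(W) = \sum_{i=1}^{m} d^2(x_i, c_W) = \sum_{x \in W} \|x\|^2 - |W| \, \|c_W\|^2,$$

where the second equality holds when $d$ is the Euclidean metric.

If $W$ is a finite subset of $\mathbb{R}^n$ and $\sigma = \{U, V\}$ is a bipartition of $W$, a
straightforward computation yields:

$$\mathrm{sse}(W) = \mathrm{sse}(U) + \mathrm{sse}(V) + \frac{|U| \, |V|}{|W|} \, \|c_U - c_V\|^2,$$

which implies

$$\mathrm{sse}(U) + \mathrm{sse}(V) \le \mathrm{sse}(W). \qquad (1)$$

Note also that if $U, W$ are two finite subsets of $\mathbb{R}^n$ such that $U \subseteq W$,
we have $\mathrm{sse}(U) \le \mathrm{sse}(W)$, which shows that $\mathrm{sse} : \mathcal{P}_{fin}(\mathbb{R}^n) \to \mathbb{R}_{\ge 0}$ is a
monotonic function. Furthermore, if $|W| = 1$, then $\mathrm{sse}(W) = 0$.

Another function that can be defined on finite subsets of $(\mathbb{R}^n, d)$ is the
diameter $\mathrm{diam} : \mathcal{P}_{fin}(\mathbb{R}^n) \to \mathbb{R}_{\ge 0}$, given by $\mathrm{diam}(W) = \max\{d(x, y) \mid x, y \in W\}$.
It is immediate that diam is monotonic.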
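The decomposition of the sum of square errors over a bipartition can be checked numerically. The sketch below assumes the Euclidean metric and represents points as tuples of floats; the helper names and the sample points are hypothetical.

```python
def centroid(W):
    """Componentwise mean of a non-empty list of points (tuples of equal length)."""
    m = len(W)
    return tuple(sum(x[k] for x in W) / m for k in range(len(W[0])))

def sq_dist(x, y):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def sse(W):
    """Sum of squared Euclidean distances from the points of W to their centroid."""
    c = centroid(W)
    return sum(sq_dist(x, c) for x in W)

U = [(0.0, 0.0), (2.0, 0.0)]
V = [(5.0, 1.0), (5.0, 3.0), (8.0, 2.0)]
W = U + V

# Between-block term of the bipartition {U, V}.
between = len(U) * len(V) / len(W) * sq_dist(centroid(U), centroid(V))
print(sse(W), sse(U) + sse(V) + between)
```

Since the between-block term is non-negative, the same computation illustrates Inequality (1) on this sample.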

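Example 3. Let $G = (V, E)$ be a connected loop-free finite graph having $V$
as its set of vertices and $E$ as its set of edges. For a set of vertices $B$ define
$\mathrm{int}(B)$, the set of internal edges of $B$, as

$$\mathrm{int}(B) = \{\{x, y\} \in E \mid \{x, y\} \subseteq B\}.$$

This definition is extended to partitions of sets of vertices by defining

$$\mathrm{int}(\pi) = \bigcup_{B \in \pi} \mathrm{int}(B).$$

The set $\mathrm{int}(\pi)$ is the set of internal edges of $\pi$.

If $\pi, \sigma \in \mathrm{PART}(V)$ then $\mathrm{int}(\pi \wedge \sigma) = \mathrm{int}(\pi) \cap \mathrm{int}(\sigma)$.
The set $\mathrm{ext}(\pi)$ of external edges of $\pi$ (also known as cut edges of $\pi$)
consists of edges that join vertices in distinct blocks and is given by:

$$\mathrm{ext}(\pi) = E - \mathrm{int}(\pi).$$

Thus, we have:

$$\begin{aligned}
\mathrm{ext}(\pi \wedge \sigma) &= E - \mathrm{int}(\pi \wedge \sigma) \\
&= E - (\mathrm{int}(\pi) \cap \mathrm{int}(\sigma)) \\
&= (E - \mathrm{int}(\pi)) \cup (E - \mathrm{int}(\sigma)) \\
&= \mathrm{ext}(\pi) \cup \mathrm{ext}(\sigma).
\end{aligned}$$

Note that

$$\mathrm{int}(\alpha_V) = \emptyset, \quad \mathrm{ext}(\alpha_V) = E, \quad \mathrm{int}(\omega_V) = E, \quad \mathrm{ext}(\omega_V) = \emptyset$$

for every graph $G = (V, E)$.
It follows from the above discussion that the function $\mathrm{int} : \mathrm{PART}(V) \to \mathcal{P}(E)$
is monotonic, while $\mathrm{ext} : \mathrm{PART}(V) \to \mathcal{P}(E)$ is dually monotonic.
Therefore, the function $\mu_{int} : \mathcal{P}(V) \to \mathbb{R}_{\ge 0}$ defined as $\mu_{int}(B) = |\mathrm{int}(B)|$
is a monotonic function.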
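The sets of internal and external edges are easy to compute directly. The following sketch assumes that edges are stored as two-element `frozenset` objects and partitions as lists of `frozenset` blocks; the 4-cycle used here is a hypothetical example graph.

```python
def int_edges(E, B):
    """Edges of E with both endpoints in the vertex set B."""
    return {e for e in E if e <= B}

def int_of_partition(E, pi):
    """Union of the internal edge sets of the blocks of pi."""
    return set().union(*(int_edges(E, B) for B in pi))

def ext_of_partition(E, pi):
    """Edges joining vertices that lie in distinct blocks of pi."""
    return set(E) - int_of_partition(E, pi)

def meet(pi, sigma):
    """Partition whose blocks are the non-empty intersections of blocks."""
    return [b & c for b in pi for c in sigma if b & c]

# Hypothetical example: the cycle 1-2-3-4-1.
E = {frozenset(e) for e in [(1, 2), (2, 3), (3, 4), (4, 1)]}
pi = [frozenset({1, 2}), frozenset({3, 4})]
sigma = [frozenset({1, 4}), frozenset({2, 3})]

print(len(ext_of_partition(E, pi)), len(ext_of_partition(E, meet(pi, sigma))))
```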

Starting from monotonic functions of sets we introduce a set of three 
axioms that define an entropy associated to these functions. 


Definition 4. Let $S$ be a non-empty set and let $\mu : \mathcal{P}(S) \to \mathbb{R}_{\ge 0}$ be a
non-negative monotonic function defined on the subsets of $S$. A $\mu$-entropy
is a function $H_\mu : \mathrm{PART}(S) \to \mathbb{R}_{\ge 0}$ that satisfies the following conditions:

• ($A_0$) initialization axiom: For any set $S$, $H_\mu(\omega_S) = 0$.

• ($A_1$) monotonicity axiom: If $\pi, \sigma \in \mathrm{PART}(S)$ and $\pi \le \sigma$, then
$H_\mu(\pi) \ge H_\mu(\sigma)$.

• ($A_2$) addition axiom: For every finite disjoint subsets $U, V$ of a set $S$
such that $S = U \cup V$, $\sigma \in \mathrm{PART}(U)$ and $\tau \in \mathrm{PART}(V)$ we have:

$$H_\mu(\sigma + \tau) = \frac{\mu(U)}{\mu(U \cup V)} H_\mu(\sigma) + \frac{\mu(V)}{\mu(U \cup V)} H_\mu(\tau) + H_\mu(\{U, V\}).$$

Note that if $H_\mu$ is a function on the partitions of $S$ that satisfies the
above axioms then for any positive $a$, $a H_\mu$ also satisfies the axioms.


Lemma 5. If $|S| = 1$, then $H_\mu(\alpha_S) = 0$.

Proof. This follows from the fact that for a singleton set $S = \{a\}$ we have
$\alpha_S = \omega_S$.


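Lemma 6. Let $U, V$ be two non-empty, finite disjoint sets, let
$\mu : \mathcal{P}(U \cup V) \to \mathbb{R}_{\ge 0}$ be a positive monotonic function of sets, and let $\sigma$ be a
partition of the set $U$. Then,

$$H_\mu(\sigma + \alpha_V) = H_\mu(\sigma + \omega_V) + \frac{\mu(V)}{\mu(U \cup V)} H_\mu(\alpha_V).$$

Proof. By Definition 4 we can write:

$$\begin{aligned}
H_\mu(\sigma + \alpha_V) &= \frac{\mu(U)}{\mu(U \cup V)} H_\mu(\sigma) + \frac{\mu(V)}{\mu(U \cup V)} H_\mu(\alpha_V) + H_\mu(\{U, V\}), \\
H_\mu(\sigma + \omega_V) &= \frac{\mu(U)}{\mu(U \cup V)} H_\mu(\sigma) + H_\mu(\{U, V\}),
\end{aligned}$$

where the second equality uses $H_\mu(\omega_V) = 0$. The equalities imply the desired result.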
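Theorem 7. Let $S$ be a set such that $|S| \ge 2$ and let $\pi = \{B_1, \ldots, B_m\}$ be
a partition of $S$. For any non-negative monotonic function $\mu : \mathcal{P}(S) \to \mathbb{R}_{\ge 0}$
we have:

$$H_\mu(\pi) = H_\mu(\alpha_S) - \sum_{i=1}^{m} \frac{\mu(B_i)}{\mu(S)} H_\mu(\alpha_{B_i}). \qquad (2)$$

Proof. Since $\pi = \omega_{B_1} + \omega_{B_2} + \cdots + \omega_{B_m}$, we can consider the descending
sequence of partitions of the set $S$:

$$\begin{aligned}
\pi_0 &= \omega_{B_1} + \omega_{B_2} + \cdots + \omega_{B_m} = \pi, \\
\pi_1 &= \alpha_{B_1} + \omega_{B_2} + \cdots + \omega_{B_m}, \\
\pi_2 &= \alpha_{B_1} + \alpha_{B_2} + \omega_{B_3} + \cdots + \omega_{B_m}, \\
&\;\;\vdots \\
\pi_m &= \alpha_{B_1} + \alpha_{B_2} + \cdots + \alpha_{B_m} = \alpha_S.
\end{aligned}$$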
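Define $\theta_i = \alpha_{B_1} + \cdots + \alpha_{B_i} + \omega_{B_{i+2}} + \cdots + \omega_{B_m} \in \mathrm{PART}(S - B_{i+1})$ for
$0 \le i \le m - 1$. Note that

$$\pi_i = \theta_i + \omega_{B_{i+1}} \quad \text{and} \quad \pi_{i+1} = \theta_i + \alpha_{B_{i+1}}$$

are both partitions of the set $S$. By Lemma 6 we have:

$$\begin{aligned}
H_\mu(\pi_1) &= H_\mu(\pi_0) + \frac{\mu(B_1)}{\mu(S)} H_\mu(\alpha_{B_1}), \\
H_\mu(\pi_2) &= H_\mu(\pi_1) + \frac{\mu(B_2)}{\mu(S)} H_\mu(\alpha_{B_2}), \\
&\;\;\vdots \\
H_\mu(\pi_m) &= H_\mu(\pi_{m-1}) + \frac{\mu(B_m)}{\mu(S)} H_\mu(\alpha_{B_m}).
\end{aligned}$$

Therefore,

$$H_\mu(\pi_m) = H_\mu(\pi_0) + \sum_{i=1}^{m} \frac{\mu(B_i)}{\mu(S)} H_\mu(\alpha_{B_i}).$$

Equivalently, since $\pi_m = \alpha_S$ and $\pi_0 = \pi$, we have

$$H_\mu(\pi) = H_\mu(\alpha_S) - \sum_{i=1}^{m} \frac{\mu(B_i)}{\mu(S)} H_\mu(\alpha_{B_i}).$$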
Corollary 8. Let $S$ be a set such that $|S| \ge 2$. For any non-negative monotonic
function $\mu : \mathcal{P}(S) \to \mathbb{R}_{\ge 0}$ and any partition $\pi = \{B_1, \ldots, B_m\} \in \mathrm{PART}(S)$
we have:

$$H_\mu(\alpha_S) \ge \sum_{i=1}^{m} \frac{\mu(B_i)}{\mu(S)} H_\mu(\alpha_{B_i}). \qquad (3)$$

Proof. By the initialization and monotonicity axioms, $\pi \le \omega_S$ implies
$H_\mu(\pi) \ge H_\mu(\omega_S) = 0$, hence the $\mu$-entropy of any partition is non-negative.
This fact combined with Theorem 7 yields the desired result.


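Example 9. Let $\mu(S) = |S|^\beta$ for any finite and non-empty set $S$ and $\beta > 0$,
$\beta \ne 1$, and let

$$H_\mu(\alpha_S) = \frac{1 - |S|^{1-\beta}}{1 - 2^{1-\beta}}.$$

By Theorem 7 this choice of $H_\mu(\alpha_S)$ implies:

$$H_\mu(\pi) = \frac{1}{1 - 2^{1-\beta}} \left( 1 - \sum_{B \in \pi} \frac{|B|^\beta}{|S|^\beta} \right),$$

which is the Havrda-Charvát generalized entropy obtained in [7].
Note that

$$\lim_{\beta \to 1} H_\mu(\alpha_S) = \log_2 |S|$$

by a straightforward application of l'Hôpital's rule.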

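The axiom ($A_1$) is satisfied. It suffices to show that $\pi \lhd \sigma$
implies $H_\mu(\pi) \ge H_\mu(\sigma)$, so let $\pi = \{B_1, \ldots, B_{m-2}, B_{m-1}, B_m\}$ and let
$\sigma = \{B_1, \ldots, B_{m-2}, B_{m-1} \cup B_m\}$. These choices imply:

$$\begin{aligned}
H_\mu(\pi) &= \frac{1}{1 - 2^{1-\beta}} \left( 1 - \sum_{i=1}^{m} \frac{|B_i|^\beta}{|S|^\beta} \right), \\
H_\mu(\sigma) &= \frac{1}{1 - 2^{1-\beta}} \left( 1 - \sum_{i=1}^{m-2} \frac{|B_i|^\beta}{|S|^\beta} - \frac{|B_{m-1} \cup B_m|^\beta}{|S|^\beta} \right),
\end{aligned}$$

and the axiom ($A_1$) is satisfied because, for $\beta > 1$,

$$|B_{m-1}|^\beta + |B_m|^\beta \le |B_{m-1} \cup B_m|^\beta.$$

For $0 < \beta < 1$ both this inequality and the sign of $1 - 2^{1-\beta}$ are reversed, so
the conclusion $H_\mu(\pi) \ge H_\mu(\sigma)$ still holds.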


The special case $\beta = 2$ yields

$$H_\mu(\pi) = 2 \left( 1 - \sum_{i=1}^{m} \frac{|B_i|^2}{|S|^2} \right),$$

which is the double of the Gini index.
By applying l'Hôpital's rule we obtain:

$$\lim_{\beta \to 1} H_\mu(\pi) = - \sum_{i=1}^{m} \frac{|B_i|}{|S|} \log_2 \frac{|B_i|}{|S|},$$

which is the Shannon entropy.
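The special cases of the Havrda-Charvát entropy can be checked numerically. In the sketch below a partition is a list of Python sets, and the limit $\beta \to 1$ is approximated by evaluating the entropy at a value of $\beta$ close to 1; the function names and the sample partitions are assumptions of this illustration.

```python
import math

def hc_entropy(pi, beta):
    """Havrda-Charvat entropy of a partition pi for mu(B) = |B|**beta, beta != 1."""
    if beta <= 0 or beta == 1:
        raise ValueError("beta must be positive and different from 1")
    n = sum(len(B) for B in pi)
    total = sum((len(B) / n) ** beta for B in pi)
    return (1.0 - total) / (1.0 - 2.0 ** (1.0 - beta))

def gini_index(pi):
    """Gini index 1 - sum (|B|/|S|)^2 of the block-size distribution."""
    n = sum(len(B) for B in pi)
    return 1.0 - sum((len(B) / n) ** 2 for B in pi)

def shannon_partition_entropy(pi):
    """Shannon entropy (base 2) of the block-size distribution."""
    n = sum(len(B) for B in pi)
    return -sum(len(B) / n * math.log2(len(B) / n) for B in pi)

pi = [{1, 2, 3}, {4, 5}, {6}]
sigma = [{1, 2, 3}, {4, 5, 6}]        # obtained from pi by fusing two blocks

print(hc_entropy(pi, 2.0), 2 * gini_index(pi))
print(hc_entropy(pi, 1.000001), shannon_partition_entropy(pi))
print(hc_entropy(pi, 2.0) >= hc_entropy(sigma, 2.0))
```

The last line illustrates the monotonicity axiom on one pair of partitions; it is a spot check, not a proof.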


Example 10. Let $\mu$ be the positive monotonic function introduced in
Example 2, $\mu(B) = \mathrm{sse}(B)$, where $B$ is a finite subset of $\mathbb{R}^n$. Choose $H_\mu(\alpha_U) = 1$
for every finite set $U \in \mathcal{P}(S)$. Then, the $\mu$-entropy is:

$$H_\mu(\pi) = 1 - \sum_{i=1}^{m} \frac{\mathrm{sse}(B_i)}{\mathrm{sse}(S)},$$

which is the expression of the inertial entropy of a partition studied in [21].
The satisfaction of axiom ($A_1$) follows from Inequality (1).
With the alternative choice $\mu(B) = \mathrm{diam}(B)$, Theorem 7 yields the entropy
$H_\mu(\pi) = 1 - \sum_{i=1}^{m} \frac{\mathrm{diam}(B_i)}{\mathrm{diam}(S)}$.
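A minimal sketch of the inertial entropy follows, assuming the Euclidean metric and one-dimensional sample points chosen only for illustration. A partition is a list of blocks, each block a list of points; all names are hypothetical.

```python
def centroid(W):
    """Componentwise mean of a non-empty list of points."""
    m = len(W)
    return tuple(sum(x[k] for x in W) / m for k in range(len(W[0])))

def sse(W):
    """Sum of squared Euclidean distances from the points of W to their centroid."""
    c = centroid(W)
    return sum(sum((a - b) ** 2 for a, b in zip(x, c)) for x in W)

def inertial_entropy(pi):
    """1 - sum_i sse(B_i) / sse(S), where S is the union of the blocks of pi."""
    S = [x for B in pi for x in B]
    total = sse(S)
    if total == 0:
        raise ValueError("sse(S) must be positive")
    return 1.0 - sum(sse(B) for B in pi) / total

# Two tight groups on the real line, clustered well and clustered badly.
good = [[(0.0,), (1.0,)], [(10.0,), (11.0,)]]
bad = [[(0.0,), (10.0,)], [(1.0,), (11.0,)]]

print(inertial_entropy(good), inertial_entropy(bad))
```

On this sample the clustering that keeps nearby points together has the larger inertial entropy, since its blocks account for a smaller share of the total sum of square errors.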


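Example 11. Let $G = (V, E)$ be a connected loop-free finite graph and
let $\mu_{gr}(B) = |\mathrm{int}(B)|$ for every set of vertices $B$ be the function defined in
Example 3. By choosing $H_\mu(\alpha_B) = 1$, the expression of the $\mu$-entropy of a
partition $\pi = \{B_1, \ldots, B_m\} \in \mathrm{PART}(V)$ is:

$$H_{\mu_{gr}}(\pi) = 1 - \sum_{i=1}^{m} \frac{|\mathrm{int}(B_i)|}{|E|} = \frac{|\mathrm{ext}(\pi)|}{|E|}.$$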


Example 12. A subset $C$ of the set of vertices $V$ of a graph $G = (V, E)$ is
a clique if every pair of distinct vertices of $C$ is joined by an edge, that is, if
$\mathrm{int}(C)$ consists of all two-element subsets of $C$.

The clique-partitioning problem of a graph $G = (V, E)$ seeks to find
a partition $\pi \in \mathrm{PART}(V)$ that consists of cliques such that each vertex is
contained in exactly one clique. Let $\kappa = \{C_1, \ldots, C_k\}$ be a clique partitioning
of $G = (V, E)$. The maximum number of edges of $G$ is $\frac{|V|(|V| - 1)}{2}$. Therefore,

$$|E| \le \frac{|V|(|V| - 1)}{2} - \frac{k(k - 1)}{2},$$

because for each pair of cliques $C, C'$ at least one edge should be absent between
the vertices of these cliques in order to avoid eliminating that pair of cliques
by consolidating $C$ and $C'$ into one clique. This inequality was established
in [3], where it is noted that the upper bound $\theta_{upper}$ of the minimum number
of cliques is

$$\theta_{upper} = \frac{1 + \sqrt{4|V|^2 - 4|V| - 8|E| + 1}}{2}.$$

It is shown that $\theta_{upper}$ is an optimal bound, which means that for each $|V|$
and $|E|$ there exists a graph that has $\theta_{upper}$ cliques.
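The size of the set of internal edges of the cliques is:

$$|\mathrm{int}(\kappa)| = \sum_{i=1}^{k} |\mathrm{int}(C_i)| = \sum_{i=1}^{k} \frac{|C_i|(|C_i| - 1)}{2} = \frac{1}{2} \left( \sum_{i=1}^{k} |C_i|^2 - |V| \right).$$

Thus, if $\kappa = \{C_1, \ldots, C_k\}$ is a clique partition of the graph $G$, its entropy is:

$$\begin{aligned}
H_{\mu_{gr}}(\kappa) &= \frac{1}{|E|} \left( |E| - \sum_{i=1}^{k} \frac{|C_i|(|C_i| - 1)}{2} \right) \\
&= \frac{1}{|E|} \left( |E| - \frac{1}{2} \sum_{i=1}^{k} |C_i|^2 + \frac{|V|}{2} \right).
\end{aligned}$$

Since $\sum_{i=1}^{k} |C_i| = |V|$, the entropy $H_{\mu_{gr}}(\kappa)$ is maximal when the sizes of
the cliques are approximately equal.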
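The entropy of a clique partition depends only on the clique sizes and on the number of edges, which makes it easy to experiment with. The sketch below evaluates the entropy and the bound on the number of cliques quoted from [3]; the function names and the edge counts used in the example are hypothetical.

```python
import math

def clique_partition_entropy(clique_sizes, num_edges):
    """(|E| - sum_i |C_i|(|C_i|-1)/2) / |E| for a clique partition with the given sizes."""
    internal = sum(c * (c - 1) // 2 for c in clique_sizes)
    if num_edges <= 0 or internal > num_edges:
        raise ValueError("the cliques need at least `internal` edges")
    return (num_edges - internal) / num_edges

def theta_upper(num_vertices, num_edges):
    """Upper bound (1 + sqrt(4|V|^2 - 4|V| - 8|E| + 1)) / 2 quoted in the text."""
    n, m = num_vertices, num_edges
    return (1 + math.sqrt(4 * n * n - 4 * n - 8 * m + 1)) / 2

# Hypothetical graphs on 6 vertices with 10 edges each.
balanced = clique_partition_entropy([3, 3], 10)     # two triangles
unbalanced = clique_partition_entropy([4, 2], 10)   # a 4-clique and an edge
print(balanced, unbalanced, theta_upper(6, 10))
```

With the number of vertices and edges held fixed, the balanced clique sizes give the larger entropy, in line with the remark above.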


Elementary properties of partition cut-sets of graphs allow us to obtain
the necessity of axiom ($A_2$) for graph entropies. Indeed, let $\kappa = \{U, W\}$ be
a cut in the graph $G$ and let $\sigma \in \mathrm{PART}(U)$ and $\tau \in \mathrm{PART}(W)$ be two
partitions of the sets $U$ and $W$. The partition $\sigma + \tau$ of $V$ consists of all
blocks of $\sigma$ and all blocks of $\tau$.

An external edge $e$ of the partition $\sigma + \tau$ falls in exactly one of the following
pairwise disjoint sets:

• $e$ is an external edge of $\sigma$ but an internal edge of $\kappa$;
• $e$ is an external edge of $\tau$ but an internal edge of $\kappa$;
• $e$ is an external edge of $\kappa$.

Since the sets $\mathrm{ext}(\sigma)$, $\mathrm{ext}(\tau)$, and $\mathrm{ext}(\kappa)$ are disjoint we have:

$$\mathrm{ext}(\sigma + \tau) = \mathrm{ext}(\sigma) \cup \mathrm{ext}(\tau) \cup \mathrm{ext}(\kappa).$$

The last equality implies

$$\frac{|\mathrm{ext}(\sigma + \tau)|}{|V|} = \frac{|U|}{|V|} \cdot \frac{|\mathrm{ext}(\sigma)|}{|U|} + \frac{|W|}{|V|} \cdot \frac{|\mathrm{ext}(\tau)|}{|W|} + \frac{|\mathrm{ext}(\kappa)|}{|V|}.$$

When this equality is expressed using the graph entropy we recover axiom
($A_2$), namely:

$$H_\mu(\sigma + \tau) = \frac{\mu(U)}{\mu(U \cup W)} H_\mu(\sigma) + \frac{\mu(W)}{\mu(U \cup W)} H_\mu(\tau) + H_\mu(\{U, W\}).$$
A graph $G = (V, E)$ is bipartite if there exists a bipartition $\pi = \{V_1, V_2\}$
such that $\mathrm{ext}(\pi) = E$. This is equivalent to the existence of a bipartition $\pi$
such that $H_\mu(\pi) = 1$. In general, a graph $G = (V, E)$ is $k$-colorable if it has
a partition $\pi = \{B_1, \ldots, B_k\}$ such that if $\{x, y\} \in E$, then $x$ and $y$ belong
to two distinct blocks of $\pi$. In other words, $G$ is $k$-colorable if and only if
there exists a partition $\pi$ of $V$ having $k$ blocks such that $H_\mu(\pi) = 1$.
Since the graph $k$-coloring problem is known to be NP-complete (see [10]),
it follows by direct transformation that the problem of deciding whether the
set of vertices of a graph has a partition with $k$ blocks whose monotonic
entropy is equal to 1 is NP-complete.
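The characterization of bipartite graphs through entropy can be illustrated by brute force on very small graphs. The sketch assumes the graph entropy takes the form $|\mathrm{ext}(\pi)|/|E|$, which is how Example 11 reads with $H_\mu(\alpha_B) = 1$; the helper names and the two test graphs are hypothetical, and the exhaustive search is meant only as an illustration, not as a practical algorithm.

```python
from itertools import combinations

def graph_entropy(E, pi):
    """Fraction of edges of E that join vertices in distinct blocks of pi."""
    internal = sum(1 for e in E if any(e <= B for B in pi))
    return (len(E) - internal) / len(E)

def has_bipartition_with_entropy_one(V, E):
    """Exhaustively search the bipartitions {B1, B2} of V for one with entropy 1."""
    V = sorted(V)
    first, rest = V[0], V[1:]
    for r in range(len(rest)):                 # B1 contains `first`, B2 stays non-empty
        for extra in combinations(rest, r):
            B1 = frozenset((first,) + extra)
            B2 = frozenset(V) - B1
            if graph_entropy(E, [B1, B2]) == 1.0:
                return True
    return False

cycle4 = {frozenset(e) for e in [(1, 2), (2, 3), (3, 4), (4, 1)]}
triangle = {frozenset(e) for e in [(1, 2), (2, 3), (1, 3)]}

print(has_bipartition_with_entropy_one({1, 2, 3, 4}, cycle4))
print(has_bipartition_with_entropy_one({1, 2, 3}, triangle))
```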


4 Conditional Monotonic Entropy 


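Let $\pi = \{B_1, \ldots, B_m\} \in \mathrm{PART}(S)$ and let $C \subseteq S$. The trace of $\pi$ on $C$ is
the partition

$$\pi_C = \{B \cap C \mid B \in \pi \text{ and } B \cap C \ne \emptyset\} \in \mathrm{PART}(C).$$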
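Definition 13. Let $\pi, \sigma \in \mathrm{PART}(S)$, where $\sigma = \{C_1, \ldots, C_n\}$. The
$\mu$-conditional entropy of $\pi$ and $\sigma$ is given by:

$$H_\mu(\pi | \sigma) = \sum_{j=1}^{n} \frac{\mu(C_j)}{\mu(S)} H_\mu(\pi_{C_j}).$$

Note that $H_\mu(\pi | \omega_S) = H_\mu(\pi)$,

$$H_\mu(\alpha_S | \sigma) = \sum_{j=1}^{n} \frac{\mu(C_j)}{\mu(S)} H_\mu(\alpha_{C_j}),$$

and $H_\mu(\pi | \alpha_S) = 0$ for every $\pi \in \mathrm{PART}(S)$.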
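Theorem 14. For any two partitions $\pi, \sigma \in \mathrm{PART}(S)$ we have:

$$H_\mu(\pi \wedge \sigma) = H_\mu(\pi | \sigma) + H_\mu(\sigma).$$

Proof. For $\pi = \{B_1, \ldots, B_m\}$ and $\sigma = \{C_1, \ldots, C_n\}$ in $\mathrm{PART}(S)$ the
conditional entropy can be written as:

$$\begin{aligned}
H_\mu(\pi | \sigma) &= \sum_{j=1}^{n} \frac{\mu(C_j)}{\mu(S)} H_\mu(\pi_{C_j}) \\
&= \sum_{j=1}^{n} \frac{\mu(C_j)}{\mu(S)} \left( H_\mu(\alpha_{C_j}) - \sum_{i=1}^{m} \frac{\mu(B_i \cap C_j)}{\mu(C_j)} H_\mu(\alpha_{B_i \cap C_j}) \right),
\end{aligned}$$

by Theorem 7 applied to the trace $\pi_{C_j}$, where the inner sum ranges over the
indices $i$ with $B_i \cap C_j \ne \emptyset$. Applying Theorem 7 to $\sigma$ and to $\pi \wedge \sigma$ turns the
right-hand side into $(H_\mu(\alpha_S) - H_\mu(\sigma)) - (H_\mu(\alpha_S) - H_\mu(\pi \wedge \sigma)) = H_\mu(\pi \wedge \sigma) - H_\mu(\sigma)$,
which is the desired equality.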
Corollary 15. Let $\pi, \sigma \in \mathrm{PART}(S)$, where $S$ is a finite set. We have

$$H_\mu(\pi \wedge \sigma) = H_\mu(\pi | \sigma) + H_\mu(\sigma) = H_\mu(\sigma | \pi) + H_\mu(\pi).$$

Proof. This is a direct consequence of Theorem 14.

The following corollary is immediate:

Corollary 16. For $\pi = \{B_1, \ldots, B_m\}$ and $\sigma = \{C_1, \ldots, C_n\}$ in $\mathrm{PART}(S)$
we have

$$H_\mu(\pi | \sigma) \le H_\mu(\pi \wedge \sigma)$$

and

$$H_\mu(\pi) \le H_\mu(\pi | \sigma) + H_\mu(\sigma).$$
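The relationship between the conditional entropy and the entropy of $\pi \wedge \sigma$ can be verified numerically for the Havrda-Charvát entropy of Example 9, that is, for $\mu(B) = |B|^\beta$. The sketch below computes the conditional entropy from traces, following Definition 13; partitions are lists of `frozenset` blocks and all helper names are assumptions of this illustration.

```python
def hc_entropy(pi, beta):
    """Havrda-Charvat entropy of pi for mu(B) = |B|**beta, beta != 1."""
    n = sum(len(B) for B in pi)
    total = sum((len(B) / n) ** beta for B in pi)
    return (1.0 - total) / (1.0 - 2.0 ** (1.0 - beta))

def trace(pi, C):
    """Trace of pi on C: the non-empty intersections of blocks of pi with C."""
    return [B & C for B in pi if B & C]

def meet(pi, sigma):
    """Partition formed by the non-empty intersections of blocks of pi and sigma."""
    return [B & C for B in pi for C in sigma if B & C]

def conditional_entropy(pi, sigma, beta):
    """sum_j mu(C_j)/mu(S) * H(trace of pi on C_j), with mu(B) = |B|**beta."""
    n = sum(len(C) for C in sigma)
    return sum((len(C) / n) ** beta * hc_entropy(trace(pi, C), beta) for C in sigma)

pi = [frozenset({1, 2, 3}), frozenset({4, 5}), frozenset({6})]
sigma = [frozenset({1, 4}), frozenset({2, 3, 5, 6})]

for beta in (0.5, 2.0, 3.0):
    lhs = hc_entropy(meet(pi, sigma), beta)
    rhs = conditional_entropy(pi, sigma, beta) + hc_entropy(sigma, beta)
    print(beta, lhs, rhs)
```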


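Theorem 17. Let $\pi, \sigma \in \mathrm{PART}(S)$ be two partitions of a finite set $S$. We
have $H_\mu(\pi | \sigma) = 0$ if and only if $\sigma \le \pi$.

Proof. Suppose that $\sigma = \{C_1, \ldots, C_n\}$. If $\sigma \le \pi$, then $\pi_{C_j} = \omega_{C_j}$ for
$1 \le j \le n$ and, therefore, $H_\mu(\pi | \sigma) = 0$.

Conversely, suppose that

$$H_\mu(\pi | \sigma) = \sum_{j=1}^{n} \frac{\mu(C_j)}{\mu(S)} H_\mu(\pi_{C_j}) = 0.$$

This implies $H_\mu(\pi_{C_j}) = 0$ for $1 \le j \le n$, which means that $\pi_{C_j} = \omega_{C_j}$ for
$1 \le j \le n$. Therefore, each block $C_j$ of $\sigma$ is included in a block of $\pi$, so
$\sigma \le \pi$.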


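We will show that the conditional monotonic entropy is dually monotonic
with respect to its first argument and monotonic with respect to its second
argument.

Theorem 18. Let $\pi, \sigma, \sigma' \in \mathrm{PART}(S)$, where $S$ is a finite subset of $\mathbb{R}^n$. If
$\sigma \le \sigma'$, then $H_\mu(\sigma | \pi) \ge H_\mu(\sigma' | \pi)$ and $H_\mu(\pi | \sigma) \le H_\mu(\pi | \sigma')$.

Proof. Since $\sigma \le \sigma'$ we have $\pi \wedge \sigma \le \pi \wedge \sigma'$, so $H_\mu(\pi \wedge \sigma) \ge H_\mu(\pi \wedge \sigma')$.
Therefore, $H_\mu(\sigma | \pi) + H_\mu(\pi) \ge H_\mu(\sigma' | \pi) + H_\mu(\pi)$, which implies
$H_\mu(\sigma | \pi) \ge H_\mu(\sigma' | \pi)$.

For the second part, it suffices to prove the inequality for partitions $\sigma$
and $\sigma'$ such that $\sigma$ is covered by $\sigma'$. Without loss of generality assume that
$\sigma = \{C_1, \ldots, C_{n-2}, C_{n-1}, C_n\}$ and $\sigma' = \{C_1, \ldots, C_{n-2}, C_{n-1} \cup C_n\}$.
We have

$$\begin{aligned}
H_\mu(\pi | \sigma') &= \sum_{j=1}^{n-2} \frac{\mu(C_j)}{\mu(S)} H_\mu(\pi_{C_j}) + \frac{\mu(C_{n-1} \cup C_n)}{\mu(S)} H_\mu(\pi_{C_{n-1} \cup C_n}) \\
&\ge \sum_{j=1}^{n-2} \frac{\mu(C_j)}{\mu(S)} H_\mu(\pi_{C_j}) + \frac{\mu(C_{n-1})}{\mu(S)} H_\mu(\pi_{C_{n-1}}) + \frac{\mu(C_n)}{\mu(S)} H_\mu(\pi_{C_n}) \\
&\qquad \text{(by the Addition Axiom)} \\
&= H_\mu(\pi | \sigma).
\end{aligned}$$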


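Theorem 19. Let $\pi, \sigma, \tau$ be three partitions of the finite set $S$, where $S \subseteq \mathbb{R}^n$.
We have:

$$H_\mu(\pi | \sigma \wedge \tau) + H_\mu(\sigma | \tau) = H_\mu(\pi \wedge \sigma | \tau).$$

Proof. By Corollary 15 we have

$$\begin{aligned}
H_\mu(\pi | \sigma \wedge \tau) &= H_\mu(\pi \wedge \sigma \wedge \tau) - H_\mu(\sigma \wedge \tau), \\
H_\mu(\sigma | \tau) &= H_\mu(\sigma \wedge \tau) - H_\mu(\tau).
\end{aligned}$$

By adding these equalities we have

$$H_\mu(\pi | \sigma \wedge \tau) + H_\mu(\sigma | \tau) = H_\mu(\pi \wedge \sigma \wedge \tau) - H_\mu(\tau).$$

A further application of Corollary 15 yields the desired equality.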


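Theorem 20. Let $\pi, \sigma, \tau$ be three partitions of the finite set $S$, where $S \subseteq \mathbb{R}^n$.
We have

$$H_\mu(\pi | \sigma) + H_\mu(\sigma | \tau) \ge H_\mu(\pi | \tau).$$

Proof. The monotonicity of the conditional entropy in its second
argument and the anti-monotonicity of the same in its first argument allow
us to write:

$$\begin{aligned}
H_\mu(\pi | \sigma) + H_\mu(\sigma | \tau) &\ge H_\mu(\pi | \sigma \wedge \tau) + H_\mu(\sigma | \tau) \\
&= H_\mu(\pi \wedge \sigma | \tau) \ge H_\mu(\pi | \tau),
\end{aligned}$$

which is the desired inequality.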
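Corollary 21. Let $\pi, \sigma$ be two partitions of the finite set $S$, where $S \subseteq \mathbb{R}^n$.
We have

$$H_\mu(\pi \vee \sigma) + H_\mu(\pi \wedge \sigma) \le H_\mu(\pi) + H_\mu(\sigma).$$

Proof. By Theorem 20 we have $H_\mu(\pi | \sigma) \le H_\mu(\pi | \tau) + H_\mu(\tau | \sigma)$ for every
$\tau \in \mathrm{PART}(S)$. Replacing the conditional entropies we obtain

$$H_\mu(\pi \wedge \sigma) - H_\mu(\sigma) \le H_\mu(\pi \wedge \tau) - H_\mu(\tau) + H_\mu(\tau \wedge \sigma) - H_\mu(\sigma),$$

which implies

$$H_\mu(\tau) + H_\mu(\pi \wedge \sigma) \le H_\mu(\pi \wedge \tau) + H_\mu(\tau \wedge \sigma).$$

Choosing $\tau = \pi \vee \sigma$ yields the desired inequality, since then $\pi \wedge \tau = \pi$
and $\tau \wedge \sigma = \sigma$.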


5 Metrics on Partitions Induced by Monotonic 
Entropies 


Conditional monotonic entropies induce metrics on spaces of partitions, as
we show next. The properties of these metrics generalize previous results
obtained by Lopez de Mantaras in [5].


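Theorem 22. The mapping $d_\mu : \mathrm{PART}(S)^2 \to \mathbb{R}_{\ge 0}$ defined as

$$d_\mu(\pi, \sigma) = H_\mu(\pi | \sigma) + H_\mu(\sigma | \pi)$$

is a metric on $\mathrm{PART}(S)$.

Proof. A double application of Theorem 20 yields

$$\begin{aligned}
H_\mu(\pi | \sigma) + H_\mu(\sigma | \tau) &\ge H_\mu(\pi | \tau), \\
H_\mu(\sigma | \pi) + H_\mu(\tau | \sigma) &\ge H_\mu(\tau | \pi).
\end{aligned}$$

Adding these inequalities gives the triangular inequality for $d_\mu$:

$$d_\mu(\pi, \sigma) + d_\mu(\sigma, \tau) \ge d_\mu(\pi, \tau).$$

The symmetry of $d_\mu$ is immediate and it is clear that $d_\mu(\pi, \pi) = 0$ for every
$\pi \in \mathrm{PART}(S)$.

Suppose now that $d_\mu(\pi, \sigma) = 0$. Since the values of $H_\mu$ are non-negative,
this implies $H_\mu(\pi | \sigma) = H_\mu(\sigma | \pi) = 0$. By Theorem 17, we have both $\sigma \le \pi$
and $\pi \le \sigma$, so $\pi = \sigma$. Thus, $d_\mu$ is a metric on $\mathrm{PART}(S)$.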
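The metric axioms can be checked exhaustively on a small set. The sketch below does this for the special case $\mu(B) = |B|^2$ of Example 9 on a four-element set, using $d_\mu(\pi, \sigma) = 2H_\mu(\pi \wedge \sigma) - H_\mu(\pi) - H_\mu(\sigma)$, which follows from Corollary 15. The enumeration routine and all names are assumptions of this illustration, and the check covers only this one choice of $\mu$ and $S$.

```python
from itertools import product

def set_partitions(elems):
    """Yield every partition of the list elems as a list of frozenset blocks."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for p in set_partitions(rest):
        yield [frozenset({first})] + p               # `first` forms its own block
        for i in range(len(p)):                      # or joins an existing block
            yield p[:i] + [p[i] | {first}] + p[i + 1:]

def entropy(pi):
    """Entropy of Example 9 for beta = 2: 2 * (1 - sum (|B|/|S|)^2)."""
    n = sum(len(B) for B in pi)
    return 2.0 * (1.0 - sum((len(B) / n) ** 2 for B in pi))

def meet(pi, sigma):
    """Partition formed by the non-empty intersections of blocks."""
    return [B & C for B in pi for C in sigma if B & C]

def d_mu(pi, sigma):
    """H(pi|sigma) + H(sigma|pi), rewritten via Corollary 15."""
    return 2.0 * entropy(meet(pi, sigma)) - entropy(pi) - entropy(sigma)

parts = list(set_partitions([0, 1, 2, 3]))
violations = [
    (p, s, t) for p, s, t in product(parts, repeat=3)
    if d_mu(p, s) + d_mu(s, t) < d_mu(p, t) - 1e-12
]
print(len(parts), len(violations))
```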
Example 23. The Rand distance of two partitions $\pi, \sigma \in \mathrm{PART}(S)$ is
the number $\mathrm{rd}(\pi, \sigma)$ of unordered pairs $\{x, y\}$ of elements of $S$ such that
there exists a block in one partition containing both $x$ and $y$ but $x$ and $y$
are in different blocks in the other partition (see [15]). For example, if
$S = \{1, 2, 3\}$, $\pi = \{\{1, 2\}, \{3\}\}$ and $\sigma = \{\{1\}, \{2, 3\}\}$, then $\mathrm{rd}(\pi, \sigma) = 2$
(the pairs involved being $\{1, 2\}$ and $\{2, 3\}$). In particular, if $\pi \in \mathrm{PART}(S)$,
then $\mathrm{rd}(\pi, \omega_S)$ equals the number of pairs $\{x, y\}$ such that $x \not\equiv y\,(\pi)$.

Actually, the Rand distance is a multiple of the metric $d_\mu$, where $\mu(U) = |U|^2$
for $U \in \mathcal{P}(S)$. Indeed, let $\pi = \{B_1, \ldots, B_m\}$ and $\sigma = \{C_1, \ldots, C_n\}$ be
two partitions in $\mathrm{PART}(S)$. We have

$$H_\mu(\pi | \sigma) = H_\mu(\pi \wedge \sigma) - H_\mu(\sigma) = \frac{2}{|S|^2} \left( \sum_{j=1}^{n} |C_j|^2 - \sum_{i=1}^{m} \sum_{j=1}^{n} |B_i \cap C_j|^2 \right).$$

The expression $\sum_{j=1}^{n} |C_j|^2 - \sum_{i=1}^{m} \sum_{j=1}^{n} |B_i \cap C_j|^2$ equals the number of ordered pairs
of elements that belong to the same block of $\sigma$ but to distinct blocks of $\pi$.
The similar expression $\sum_{i=1}^{m} |B_i|^2 - \sum_{i=1}^{m} \sum_{j=1}^{n} |B_i \cap C_j|^2$ gives the number
of ordered pairs of elements that belong to the same class of $\pi$, but to two distinct
classes of $\sigma$, and, therefore, $\mathrm{rd}(\pi, \sigma)$ is a multiple of $d_\mu$.
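The relation between the Rand distance and $d_\mu$ for $\mu(U) = |U|^2$ can be checked on the three-element example above. Counting ordered pairs as in the two expressions and using the factor $2/|S|^2$ of the entropy for $\beta = 2$ suggests $d_\mu = 4\,\mathrm{rd}/|S|^2$; this constant is derived here for the illustration and is not stated in the text. All function names are hypothetical.

```python
from itertools import combinations

def together(pi, x, y):
    """True if x and y lie in the same block of pi."""
    return any(x in B and y in B for B in pi)

def rand_distance(S, pi, sigma):
    """Number of unordered pairs placed together by exactly one of the partitions."""
    return sum(
        1 for x, y in combinations(sorted(S), 2)
        if together(pi, x, y) != together(sigma, x, y)
    )

def entropy(pi):
    """Entropy of Example 9 for beta = 2: 2 * (1 - sum (|B|/|S|)^2)."""
    n = sum(len(B) for B in pi)
    return 2.0 * (1.0 - sum((len(B) / n) ** 2 for B in pi))

def meet(pi, sigma):
    """Partition formed by the non-empty intersections of blocks."""
    return [B & C for B in pi for C in sigma if B & C]

def d_mu(pi, sigma):
    """H(pi|sigma) + H(sigma|pi), rewritten via Corollary 15."""
    return 2.0 * entropy(meet(pi, sigma)) - entropy(pi) - entropy(sigma)

S = {1, 2, 3}
pi = [frozenset({1, 2}), frozenset({3})]
sigma = [frozenset({1}), frozenset({2, 3})]

rd = rand_distance(S, pi, sigma)
print(rd, d_mu(pi, sigma), 4 * rd / len(S) ** 2)
```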
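Lemma 24. Let $\pi, \sigma, \tau$ be three partitions in $\mathrm{PART}(S)$. We have:

$$\frac{H_\mu(\pi | \sigma)}{H_\mu(\pi \wedge \sigma)} + \frac{H_\mu(\sigma | \tau)}{H_\mu(\sigma \wedge \tau)} \ge \frac{H_\mu(\pi | \tau)}{H_\mu(\pi \wedge \tau)}.$$

Proof. By applying the definition of conditional entropy we can write:

$$\begin{aligned}
\frac{H_\mu(\pi | \sigma)}{H_\mu(\pi \wedge \sigma)} + \frac{H_\mu(\sigma | \tau)}{H_\mu(\sigma \wedge \tau)}
&= \frac{H_\mu(\pi | \sigma)}{H_\mu(\pi | \sigma) + H_\mu(\sigma)} + \frac{H_\mu(\sigma | \tau)}{H_\mu(\sigma | \tau) + H_\mu(\tau)} \\
&\qquad \text{(by Theorem 14)} \\
&\ge \frac{H_\mu(\pi | \sigma)}{H_\mu(\pi | \sigma) + H_\mu(\sigma | \tau) + H_\mu(\tau)} + \frac{H_\mu(\sigma | \tau)}{H_\mu(\pi | \sigma) + H_\mu(\sigma | \tau) + H_\mu(\tau)} \\
&\qquad \text{(by Corollary 16)} \\
&= \frac{H_\mu(\pi | \sigma) + H_\mu(\sigma | \tau)}{H_\mu(\pi | \sigma) + H_\mu(\sigma | \tau) + H_\mu(\tau)} \\
&\ge \frac{H_\mu(\pi | \tau)}{H_\mu(\pi | \tau) + H_\mu(\tau)} = \frac{H_\mu(\pi | \tau)}{H_\mu(\pi \wedge \tau)},
\end{aligned}$$

where the last inequality follows from Theorem 20 and the fact that $x \mapsto \frac{x}{x + c}$ is
increasing for $c \ge 0$. This is the desired inequality.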


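Theorem 25. The mapping $\delta_\mu : \mathrm{PART}(S)^2 \to \mathbb{R}_{\ge 0}$ defined as

$$\delta_\mu(\pi, \sigma) = \frac{d_\mu(\pi, \sigma)}{H_\mu(\pi \wedge \sigma)}$$

is a metric on $\mathrm{PART}(S)$ such that $0 \le \delta_\mu(\pi, \sigma) \le 1$ for $\pi, \sigma \in \mathrm{PART}(S)$.

Proof. The non-negativity and the symmetry of $\delta_\mu$ are immediate. To prove
the triangular axiom we write:

$$\begin{aligned}
\delta_\mu(\pi, \tau) &= \frac{d_\mu(\pi, \tau)}{H_\mu(\pi \wedge \tau)} \\
&= \frac{H_\mu(\pi | \tau) + H_\mu(\tau | \pi)}{H_\mu(\pi \wedge \tau)} \\
&\le \frac{H_\mu(\pi | \sigma)}{H_\mu(\pi \wedge \sigma)} + \frac{H_\mu(\sigma | \tau)}{H_\mu(\sigma \wedge \tau)} + \frac{H_\mu(\tau | \sigma)}{H_\mu(\tau \wedge \sigma)} + \frac{H_\mu(\sigma | \pi)}{H_\mu(\sigma \wedge \pi)} \\
&\qquad \text{(by Lemma 24)} \\
&= \delta_\mu(\pi, \sigma) + \delta_\mu(\sigma, \tau).
\end{aligned}$$

Furthermore, since $H_\mu(\pi | \sigma) \le H_\mu(\pi \wedge \sigma)$ it follows that $0 \le \delta_\mu(\pi, \sigma) \le 1$.

For $\pi, \sigma \in \mathrm{PART}(S)$ we have:

$$\delta_\mu(\pi, \sigma) = 2 - \frac{H_\mu(\pi) + H_\mu(\sigma)}{H_\mu(\pi \wedge \sigma)}.$$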
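The two expressions of the normalized distance can be compared numerically. The sketch below again uses the entropy of Example 9 with $\beta = 2$; returning 0 when $H_\mu(\pi \wedge \sigma) = 0$ (both partitions equal to $\omega_S$) is a convention adopted only for this example, since the quotient is undefined in that case.

```python
def entropy(pi):
    """Entropy of Example 9 for beta = 2: 2 * (1 - sum (|B|/|S|)^2)."""
    n = sum(len(B) for B in pi)
    return 2.0 * (1.0 - sum((len(B) / n) ** 2 for B in pi))

def meet(pi, sigma):
    """Partition formed by the non-empty intersections of blocks."""
    return [B & C for B in pi for C in sigma if B & C]

def delta_quotient(pi, sigma):
    """d_mu(pi, sigma) / H(pi ^ sigma), with d_mu written via Corollary 15."""
    h_meet = entropy(meet(pi, sigma))
    if h_meet == 0:
        return 0.0                     # convention of this sketch: pi = sigma = omega_S
    return (2.0 * h_meet - entropy(pi) - entropy(sigma)) / h_meet

def delta_closed_form(pi, sigma):
    """2 - (H(pi) + H(sigma)) / H(pi ^ sigma)."""
    h_meet = entropy(meet(pi, sigma))
    if h_meet == 0:
        return 0.0
    return 2.0 - (entropy(pi) + entropy(sigma)) / h_meet

pi = [frozenset({1, 2}), frozenset({3})]
sigma = [frozenset({1}), frozenset({2, 3})]
print(delta_quotient(pi, sigma), delta_closed_form(pi, sigma))
```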


Example 26. For the graph-related entropy introduced in Example 11 the
distance $\delta_\mu$ is given by:

$$\delta_\mu(\pi, \sigma) = 2 - \frac{|\mathrm{ext}(\pi)| + |\mathrm{ext}(\sigma)|}{|\mathrm{ext}(\pi \wedge \sigma)|}.$$


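Example 27. We consider an analogue of the Rand distance between
partitions of sets of vertices in undirected graphs that can be introduced using
our approach. Using the notations from Example 3, let $\pi, \sigma \in \mathrm{PART}(V)$,
where $V$ is the set of vertices of a graph $G = (V, E)$; the graph Rand distance
$d(\pi, \sigma)$ between these partitions is:

$$\begin{aligned}
d(\pi, \sigma) &= 2 - \frac{|\mathrm{ext}(\pi)| + |\mathrm{ext}(\sigma)|}{|\mathrm{ext}(\pi \wedge \sigma)|} \\
&= \frac{|\mathrm{int}(\pi) \cap \mathrm{ext}(\sigma)| + |\mathrm{int}(\sigma) \cap \mathrm{ext}(\pi)|}{|\mathrm{ext}(\pi \wedge \sigma)|},
\end{aligned}$$

because

$$\begin{aligned}
\mathrm{ext}(\pi \wedge \sigma) - \mathrm{ext}(\pi) &= \mathrm{int}(\pi) \cap \mathrm{ext}(\sigma), \\
\mathrm{ext}(\pi \wedge \sigma) - \mathrm{ext}(\sigma) &= \mathrm{int}(\sigma) \cap \mathrm{ext}(\pi).
\end{aligned}$$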
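Both expressions of the graph Rand distance can be evaluated on a small graph. The sketch below assumes edges stored as two-element `frozenset` objects and a hypothetical graph on four vertices; it presumes that $\mathrm{ext}(\pi \wedge \sigma)$ is non-empty.

```python
def int_of_partition(E, pi):
    """Edges of E with both endpoints inside a single block of pi."""
    return {e for e in E if any(e <= B for B in pi)}

def ext_of_partition(E, pi):
    """Edges of E joining vertices in distinct blocks of pi."""
    return set(E) - int_of_partition(E, pi)

def meet(pi, sigma):
    """Partition formed by the non-empty intersections of blocks."""
    return [B & C for B in pi for C in sigma if B & C]

def graph_rand_distance(E, pi, sigma):
    """2 - (|ext(pi)| + |ext(sigma)|) / |ext(pi ^ sigma)|."""
    ext_meet = ext_of_partition(E, meet(pi, sigma))
    return 2.0 - (len(ext_of_partition(E, pi)) + len(ext_of_partition(E, sigma))) / len(ext_meet)

def graph_rand_distance_alt(E, pi, sigma):
    """(|int(pi) & ext(sigma)| + |int(sigma) & ext(pi)|) / |ext(pi ^ sigma)|."""
    ext_meet = ext_of_partition(E, meet(pi, sigma))
    mixed = (len(int_of_partition(E, pi) & ext_of_partition(E, sigma))
             + len(int_of_partition(E, sigma) & ext_of_partition(E, pi)))
    return mixed / len(ext_meet)

# Hypothetical graph: triangle 1-2-3 with a pendant edge 3-4.
E = {frozenset(e) for e in [(1, 2), (2, 3), (1, 3), (3, 4)]}
pi = [frozenset({1, 2, 3}), frozenset({4})]
sigma = [frozenset({1}), frozenset({2, 3}), frozenset({4})]

print(graph_rand_distance(E, pi, sigma), graph_rand_distance_alt(E, pi, sigma))
```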


6 Conclusion 


We introduced monotonic entropy as a generalization of Shannon entropy and 
formulated a system of three axioms that depend on a numeric monotonic 
function defined on the set of partitions of a finite set. By specializing this 
function we recapture as a special case the Shannon entropy. Furthermore, 
this axiomatization allows us to extend the notion of entropy to partitions 
of sets of objects that possess special properties (such as being embedded in
a metric space, or being defined by partitions of undirected graphs). 

We show that the notion of conditional entropy defined for the newly 
axiomatized types of entropy allows us to introduce certain metrics on 
partitions generalizing the results obtained in [5]. 

Since clusterings can be regarded as partitions, the new metric structures
will allow us to apply our results in the study of the stability of clustering
algorithms (see [1, 2, 12, 13]) and in the external validation of clusterings,
where an a priori data labeling can be compared with the results of clustering
algorithms.